103 research outputs found

    k-NNN: Nearest Neighbors of Neighbors for Anomaly Detection

    Full text link
    Anomaly detection aims at identifying images that deviate significantly from the norm. We focus on algorithms that embed the normal training examples in space and when given a test image, detect anomalies based on the features distance to the k-nearest training neighbors. We propose a new operator that takes into account the varying structure & importance of the features in the embedding space. Interestingly, this is done by taking into account not only the nearest neighbors, but also the neighbors of these neighbors (k-NNN). We show that by simply replacing the nearest neighbor component in existing algorithms by our k-NNN operator, while leaving the rest of the algorithms untouched, each algorithms own results are improved. This is the case both for common homogeneous datasets, such as flowers or nuts of a specific type, as well as for more diverse dataset

    CLID: Controlled-Length Image Descriptions with Limited Data

    Full text link
    Controllable image captioning models generate human-like image descriptions, enabling some kind of control over the generated captions. This paper focuses on controlling the caption length, i.e. a short and concise description or a long and detailed one. Since existing image captioning datasets contain mostly short captions, generating long captions is challenging. To address the shortage of long training examples, we propose to enrich the dataset with varying-length self-generated captions. These, however, might be of varying quality and are thus unsuitable for conventional training. We introduce a novel training strategy that selects the data points to be used at different times during the training. Our method dramatically improves the length-control abilities, while exhibiting SoTA performance in terms of caption quality. Our approach is general and is shown to be applicable also to paragraph generation

    Is Image Memorability Prediction Solved?

    Full text link
    This paper deals with the prediction of the memorability of a given image. We start by proposing an algorithm that reaches human-level performance on the LaMem dataset - the only large scale benchmark for memorability prediction. The suggested algorithm is based on three observations we make regarding convolutional neural networks (CNNs) that affect memorability prediction. Having reached human-level performance we were humbled, and asked ourselves whether indeed we have resolved memorability prediction - and answered this question in the negative. We studied a few factors and made some recommendations that should be taken into account when designing the next benchmark

    CloudWalker: Random walks for 3D point cloud shape analysis

    Full text link
    Point clouds are gaining prominence as a method for representing 3D shapes, but their irregular structure poses a challenge for deep learning methods. In this paper we propose CloudWalker, a novel method for learning 3D shapes using random walks. Previous works attempt to adapt Convolutional Neural Networks (CNNs) or impose a grid or mesh structure to 3D point clouds. This work presents a different approach for representing and learning the shape from a given point set. The key idea is to impose structure on the point set by multiple random walks through the cloud for exploring different regions of the 3D object. Then we learn a per-point and per-walk representation and aggregate multiple walk predictions at inference. Our approach achieves state-of-the-art results for two 3D shape analysis tasks: classification and retrieval
    • …
    corecore